Producing Public-use Microdata That Are Analytically Valid and Confidential

Author

  • William E. Winkler
Abstract

A public-use microdata file should be analytically valid. For a very small number of uses, the microdata should yield analytic results that are approximately the same as those obtained from the original, confidential file, which is not distributed. If the microdata file contains a moderate number of variables and is required to meet a single set of analytic needs of, say, university researchers, then many more records are likely to be re-identified via modern record linkage methods than via the re-identification methods typically used in the confidentiality literature. This paper compares several masking methods in terms of their ability to produce analytically valid, confidential microdata.

Similar resources

Re-identification Methods for Evaluating the Confidentiality of Analytically Valid Microdata

Disclaimer: This report is released to inform interested parties of ongoing research and to encourage discussion of work in progress. The views expressed are those of the author and not necessarily those of the U.S. Census Bureau. A public-use microdata file should be analytically valid. For a very small number of uses, the microdata should yield analytic results that are approximately the same...

Seeking explanation in theory: Reflections on the social practices of organizations that distribute public use microdata files for research purposes

(2001). Seeking explanation in theory: Reflections on the social practices of organizations that distribute public use microdata files for research purposes. Public concern about personal privacy has recently focused on issues of Internet data security and personal information as big business. The scientific discourse about information privacy focuses on the cross-pressures of maintaining conf...

NORC Data Enclave

Launched in 2006, the NORC Data Enclave provides a confidential, protected environment within which authorized researchers can access sensitive microdata remotely. While public-use data can be disseminated in a variety of ways, fewer options exist for sharing sensitive microdata that have not been fully de-identified for public use. Some data producers have sufficient economies of scale to deve...

Combining synthetic data with subsampling to create public use microdata files for large scale surveys

To create public use files from large scale surveys, statistical agencies sometimes release random subsamples of the original records. Random subsampling reduces file sizes for secondary data analysts and reduces risks of unintended disclosures of survey participants’ confidential information. However, subsampling does not eliminate risks, so that alteration of the data is needed before dissemi...

Releasing Individually Identifiable Microdata with Privacy Protection Against Stochastic Threat: An Application to Health Information

The ability to collect and disseminate individually identifiable microdata is becoming increasingly important in a number of arenas. This is especially true in health care and national security, where this data is considered vital for a number of public health and safety initiatives. In some cases legislation has been used to establish some standards for limiting the collection of and access t...

Publication date: 1997